This is an attempt at a streamlined and yet complete, relatively a priori/non- snooped model analysis. Note that this version is an overhaul, with some missing components from the older results: see MixedEffects_older.Rmd …
Some parts of the analysis have been moved into separate .R files: the overall workflow is
(mmd_procdata is out of date: Enric has processing machinery to extract lat/long of centroid of largest polygon (?) (x/y in the data set) and area …)
source("mmd_utils.R")
source("gamm4_utils.R")
Packages used/versions:
load_all_pkgs()
theme_set(theme_bw())
zmargin <- theme(panel.spacing=grid::unit(0,"lines"))
zmarginx <- theme(panel.spacing.x=grid::unit(0,"lines"))
library('pander') ## for tables
sapply(pkgList,function(x) format(packageVersion(x)))
## lme4 gamm4 bbmle broom broom.mixed
## "1.1.16" "0.2.5" "1.0.20" "0.4.3" "0.0.1"
## lattice gridExtra ggplot2 plotly ggstance
## "0.20.35" "2.3" "2.2.1.9000" "4.7.1" "0.3"
## dplyr tidyr tibble remef r2glmm
## "0.7.4" "0.7.2" "1.4.2" "1.0.6.9000" "0.1.2"
Note that you now need the development version of lme4, i.e. devtools::install_github("lme4/lme4").
Data from Enric (includes area and lat/long coordinates):
L <- load("ecoreg.RData")
Limit all observations with all predictor variables > 0 and no biome 98/99; set biome as factor. Log and scale/center variables as appropriate. This leaves us with 29 variables and 620 observations.
This can now be done semi-automatically (i.e., fit all combinations of random effects) by using the function fit_all from the mmd_utils.R file (sourced above), e.g. fit_all(response="mbirds_log")
The mmd_fitbatch.R code has already run all random-effects model combinations for all four response variables, both with lme4 and with gamm4. mmd_reduce.R has reduced these to lists with components sum (summaries: AIC, singularity, etc.); coef (coefficients); and pred (fitted, residuals, etc.).
Load data:
load("allfits_sum_lmer.RData") ## 4 taxa x 27 fits, using lmer
load("allfits_sum_gamm4.RData") ## 4 taxa x 27 fits, using gamm4
Table of models with AIC<10:
aictab1 <- lme4_res$sum %>%
filter(AIC<8) %>%
arrange(taxon,AIC) %>%
mutate(AIC=round(AIC,1),
model=shorten_modelname(model))
pander(select(aictab1,-best),emphasize.strong.rows=which(aictab1$best))
| taxon | model | AIC | singular | df |
|---|---|---|---|---|
| mamph_log | b=i/fr=i/b_FR=d | 0 | FALSE | 20 |
| mamph_log | b=d/fr=i/b_FR=i | 3 | TRUE | 20 |
| mamph_log | b=d/fr=i/b_FR=d | 3.9 | TRUE | 24 |
| mamph_log | b=d/fr=d/b_FR=i | 5.3 | TRUE | 24 |
| mamph_log | b=i/fr=d/b_FR=d | 5.5 | TRUE | 24 |
| mamph_log | b=i/fr=i/b_FR=f | 7.2 | FALSE | 30 |
| mbirds_log | b=i/fr=i/b_FR=d | 0 | FALSE | 20 |
| mbirds_log | b=i/fr=d/b_FR=d | 4.6 | TRUE | 24 |
| mbirds_log | b=i/fr=d/b_FR=f | 5.7 | TRUE | 34 |
| mbirds_log | b=i/fr=i/b_FR=f | 6.2 | TRUE | 30 |
| mbirds_log | b=d/fr=i/b_FR=d | 7.2 | TRUE | 24 |
| mmamm_log | b=i/fr=d/b_FR=f | 0 | TRUE | 34 |
| mmamm_log | b=i/fr=i/b_FR=f | 3.3 | TRUE | 30 |
| mmamm_log | b=d/fr=d/b_FR=f | 6.7 | TRUE | 38 |
| mmamm_log | b=i/fr=f/b_FR=f | 7.4 | TRUE | 44 |
| mmamm_log | b=i/fr=d/b_FR=d | 7.8 | FALSE | 24 |
| plants_log | b=i/fr=i/b_FR=d | 0 | FALSE | 20 |
| plants_log | b=d/fr=i/b_FR=i | 3 | TRUE | 20 |
| plants_log | b=d/fr=i/b_FR=d | 3.3 | TRUE | 24 |
| plants_log | b=i/fr=i/b_FR=f | 4.2 | TRUE | 30 |
| plants_log | b=i/fr=d/b_FR=d | 6.3 | TRUE | 24 |
| plants_log | b=d/fr=i/b_FR=f | 7.8 | TRUE | 34 |
Best (non-singular) models only:
lme4_best_sum <- lme4_res$sum %>%
filter(best) %>% select(-c(best,singular))
gamm4_best_sum <- gamm4_res$sum %>%
filter(best) %>% select(-c(best,singular))
all_best_sum <- bind_rows(list(lme4=lme4_best_sum,gamm4=gamm4_best_sum),
.id="type")
pander(all_best_sum)
| type | taxon | model | AIC | df |
|---|---|---|---|---|
| lme4 | plants_log | biome=int/flor_realms=int/biome_FR=diag | 0 | 20 |
| lme4 | mamph_log | biome=int/flor_realms=int/biome_FR=diag | 0 | 20 |
| lme4 | mbirds_log | biome=int/flor_realms=int/biome_FR=diag | 0 | 20 |
| lme4 | mmamm_log | biome=int/flor_realms=diag/biome_FR=diag | 7.835 | 24 |
| gamm4 | plants_log | biome=int/flor_realms=diag/biome_FR=int | 9.79 | 21 |
| gamm4 | mamph_log | biome=int/flor_realms=int/biome_FR=int | 22.63 | 17 |
| gamm4 | mmamm_log | biome=int/flor_realms=diag/biome_FR=int | 25.77 | 21 |
| gamm4 | mbirds_log | biome=int/flor_realms=int/biome_FR=int | 36.9 | 17 |
Our current strategy is to (1) take the best non-singular model fitted by lme4; (2) find the coefficients of the corresponding gamm4 model. (Below, we also try taking the best non-singular gamm4 model.)
lme4_best_models <- select(lme4_best_sum,c(taxon,model))
gamm4_best_models <- select(gamm4_best_sum,c(taxon,model))
## keep only predictions from best models
gamm4_best_pred <- gamm4_res$pred %>%
right_join(lme4_best_models,by=c("taxon","model"))
Fitted vs residual for all four taxa:
ggplot(gamm4_best_pred,aes(.fitted,.resid))+
geom_point()+geom_smooth()+
facet_wrap(~taxon,nrow=1)+zmargin
A little bit heavy-tailed …
## https://stackoverflow.com/questions/40598011/how-to-customize-hover-information-in-ggplotly-object
ggqq <- ggplot(gamm4_best_pred,aes(sample=.resid,
colour=biome,
flor_realms=flor_realms))+
stat_qq()+stat_qq_line(aes(group=taxon))+
facet_wrap(~taxon,nrow=1)+zmargin
## print(ggqq)
ggplotly(ggqq,tooltip=c("biome","flor_realms"))
For example:
gamm4_allcoef <- get_allcoefs(gamm4_res,"plants_log") %>% add_wald_ci
lme4_allcoef <- get_allcoefs(lme4_res,"plants_log") %>% add_wald_ci
gg_allcoef <- ggplot(gamm4_allcoef,aes(estimate,model,colour=singular,shape=singular))+
## use point + linerange because some RE are missing std.errors
geom_point()+
geom_linerangeh(aes(xmin=lwr,xmax=upr))+
facet_wrap(~term,scale="free_x")+
geom_vline(xintercept=0,lty=2)+
scale_colour_brewer(palette="Set1")+
zmarginx
All gamm4 coefs for plants:
print(gg_allcoef)
all lme4 coefs for plants:
print(gg_allcoef %+% lme4_allcoef)
to do: why so many singular now … ?
to do: redo profiling/profile plots, comparison of Wald/profile CIs (at least for best models …
Check difference between Wald and likelihood profile confidence intervals. In principle profile CIs are more accurate - if the computations have run reliably … but I would probably be conservative and take the wider of the two.
Pull out coefs from best models for lme4, gamm4 (from lme4-best model), gamm4 (from gamm4-best model)
gamm4_best_coef <- gamm4_res$coef %>%
right_join(lme4_best_models,by=c("taxon","model")) %>%
add_wald_ci %>% drop_intercept
gamm4_best2_coef <- gamm4_res$coef %>%
right_join(gamm4_best_models,by=c("taxon","model")) %>%
add_wald_ci %>% drop_intercept
lme4_best_coef <- lme4_res$coef %>%
right_join(lme4_best_models,by=c("taxon","model")) %>%
add_wald_ci %>% drop_intercept
all_best_coef <- bind_rows(list(lme4=lme4_best_coef,gamm4=gamm4_best_coef,
gamm4_2=gamm4_best2_coef),
.id="type")
to do: plot all coefficients including std devs (need to combine group with term for random effects)
Exclude random effects and NPP_log (which is large and significant for all taxa). Abuse warning: coloring significant effects. Snooping warning: counting number of sig values for each model type!
There are not a lot of effects other than NPP_log that are consistently large and significant; perhaps main effects of fire for birds and mammals. Amphibians have considerably larger effects (the last three: Feat cv and its interactions).
Load best non-singular gamm4 fits:
load("bestmodels_gamm4.RData")
names(best_models)
## [1] "plants_log" "mamph_log" "mbirds_log" "mmamm_log"
plotfun() takes arguments:
model: fitted modelxvar (“NPP_log”): x-variableauxvar (“Feat_cv_sv”): auxiliary variable (e.g. for examining interactions)respvar (equal to model response by default): response variableaux_quantiles: (0.1, 0.5, 0.9) quantiles of auxiliary variable to predictpred_lower_lim (-3) : lower cut off values (log scale)data (ecoreg)re.form (NA) which RE to include in predictions (default is none)ggplot1 <- plotfun(best_models[[1]])
print(ggplot1)
or via ggplotly():
ggplotly(ggplot1)
Partial residuals (not working yet):
fix_NAs <- function(rem,model) {
if (!is.null(nastuff <- attr(model.frame(model),"na.action"))) {
return(napredict(nastuff,rem))
} else return(rem)
}
rem1 <- fix_NAs(remef(best_models[[1]]$mer,ran="all"),best_models[[1]])
if (length(rem1)==nrow(ecoreg)) {
## if it worked ...
ecoreg$rem1 <- rem1
## update previous plot:
plotfun(best_model,respvar="rem1")
}